Jan-Philipp Kolb
8 May 2017
Modular structure
Import
How is the GitHub repository used?
https://github.com/Japhilko/RInterfaces
It is worth returning to this page again and again, because all relevant documents are linked there.
In general, the best way to follow the course is with the complete file. If you want to download individual parts, it is best to download the corresponding PDF.
The PDF files are better suited for printing.
They can be downloaded with the Raw button.
Raw button for download
Alongside the slides, an R file is usually provided as well.
You can either download the entire R file and run it in R, or copy and paste individual commands.
Occasionally data sets are available too.
.csv files can be read directly by R (I will show how this works later).
If you want to download the .csv files, also use the Raw button.
Download all other files (e.g. .RData) with the Raw button as well.
If you are interested in learning more about GitHub:
Go to https://cran.r-project.org/ and search the section where the packages are presented for packages,…
R already supports some important formats out of the box:
read.csv()
read.fwf()
read.delim()
The Import button
https://data.montgomerycountymd.gov/api/views/6rqk-pdub/rows.csv?accessType=DOWNLOAD
This is how to find out which directory you are currently in:
getwd()
This is how to change the working directory:
First create an object that stores the path:
main.path <- "C:/" # example for Windows
main.path <- "/users/Name/" # example for Mac
main.path <- "/home/user/" # example for Linux
Then change the path with setwd():
setwd(main.path)
On Windows it is important to use slashes instead of backslashes.
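A minimal sketch of switching directories safely (the directory name myproject is made up): file.path() joins path components with the correct separator on every platform, and storing the old path lets you switch back afterwards.

```r
old.wd <- getwd()                       # remember where we are

# file.path() joins components with "/" on every platform
main.path <- file.path(tempdir(), "myproject")
dir.create(main.path, showWarnings = FALSE)

setwd(main.path)                        # change the working directory
basename(getwd())                       # now "myproject"

setwd(old.wd)                           # switch back
```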
readr
install.packages("readr")
library(readr)
The readr package is helpful for importing foreign data formats; such formats can also be read with the foreign package.
library(readr)
rows <- read_csv("https://data.montgomerycountymd.gov/api/views/6rqk-pdub/rows.csv?accessType=DOWNLOAD")
Importing .csv data from the web - a second example:
url <- "https://raw.githubusercontent.com/Japhilko/GeoData/master/2015/data/whcSites.csv"
whcSites <- read.csv(url)
head(data.frame(whcSites$name_en,whcSites$category))
## whcSites.name_en
## 1 Cultural Landscape and Archaeological Remains of the Bamiyan Valley
## 2 Minaret and Archaeological Remains of Jam
## 3 Historic Centres of Berat and Gjirokastra
## 4 Butrint
## 5 Al Qal'a of Beni Hammad
## 6 M'Zab Valley
## whcSites.category
## 1 Cultural
## 2 Cultural
## 3 Cultural
## 4 Cultural
## 5 Cultural
## 6 Cultural
haven
install.packages("haven")
library(haven)
mtcars <- read_sav("https://github.com/Japhilko/RInterfaces/raw/master/data/mtcars.sav")
oecd <- read_dta("https://github.com/Japhilko/IntroR/raw/master/2017/data/oecd.dta")
In addition to the read.X() functions, many write.X() functions are available.
.RData
A <- c(1,2,3,4)
B <- c("A","B","C","D")
mydata <- data.frame(A,B)
Saving in .RData format works best with:
save(mydata, file="mydata.RData")
Saving in .csv format:
write.csv(mydata,file="mydata.csv")
In many locales write.csv2 (semicolon separator, decimal comma) is better:
write.csv2(mydata,file="mydata.csv")
xlsx
library(xlsx)
write.xlsx(mydata,file="mydata.xlsx")
foreign
library(foreign)
write.dta(mydata,file="data/mydata.dta")
rio
install.packages("rio")
library("rio")
# create file to convert
export(mtcars, "data/mtcars.sav")
export(mtcars, "data/mtcars.dta")
# convert Stata to SPSS
convert("data/mtcars.dta", "data/mtcars.sav")
Quick R on exporting data:
Help on exporting on the CRAN server
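A quick round trip, using a temporary file, shows the practical difference: write.csv2() uses the semicolon as field separator and the decimal comma, and read.csv2() is its matching reader.

```r
mydata <- data.frame(A = c(1.5, 2.5), B = c("A", "B"))

f <- tempfile(fileext = ".csv")
write.csv2(mydata, file = f, row.names = FALSE)  # semicolon-separated

readLines(f)[2]      # first data row: fields separated by ";", decimal comma

back <- read.csv2(f) # read it back with the matching reader
all.equal(back$A, mydata$A)
```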
xlsx
library("xlsx")
dat <- read.xlsx("cult_emp_sex.xls",1)
XLConnect
install.packages("XLConnect")
library("XLConnect")
fileXls <- "data/newFile.xlsx"
unlink(fileXls, recursive = FALSE, force = FALSE)
exc <- loadWorkbook(fileXls, create = TRUE)
createSheet(exc,'Input')
saveWorkbook(exc)
input <- data.frame('inputType'=c('Day','Month'),'inputValue'=c(2,5))
writeWorksheet(exc, input, sheet = "Input", startRow = 1, startCol = 2)
saveWorkbook(exc)
myFunction <- function(){
  aa <- rnorm(200)
  bb <- rnorm(200)
  res <- lm(aa~bb)$res
  return(res)
}
BERT
install.packages("readxl")
library(readxl)
Markdown is a very simple syntax that allows users to create well-laid-out documents from plain text files.
**bold example**
*italic example*
~~strikethrough~~
- bullet point
### Heading level 3
#### Heading level 4
[My GitHub page](https://github.com/Japhilko)


n=100
An inline code block: 100
| Argument | Description |
|---|---|
| eval | Should the R code be evaluated? |
| warning | Should warnings be displayed? |
| cache | Should the output be cached? |
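In an R Markdown document these options are set in the chunk header; a small sketch (the chunk label example is arbitrary):

````markdown
```{r example, eval=TRUE, warning=FALSE, cache=TRUE}
x <- rnorm(100)
summary(x)
```
````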
knitr
install.packages("knitr")
library("knitr")
Use kable to create tables:
a <- runif(10)
b <- rnorm(10)
ab <- cbind(a,b)
kable(ab)
| a | b |
|---|---|
| 0.6034022 | -0.2446712 |
| 0.8620643 | 0.5327476 |
| 0.9971391 | 1.1225467 |
| 0.8382172 | 1.6393233 |
| 0.7787951 | 0.3268743 |
| 0.6682560 | -0.3171533 |
| 0.9880234 | -0.0831934 |
| 0.1525396 | 1.7129532 |
| 0.8614273 | -1.0471565 |
| 0.4226663 | -1.1787396 |
date: "06 May, 2017"
If cache=T is specified, the result of the chunk is saved.
---
title: "Intro - Erste Schritte"
author: "Jan-Philipp Kolb"
date: "10 April 2017"
output:
beamer_presentation:
colortheme: beaver
theme: CambridgeUS
---
output:
beamer_presentation:
toc: yes
\Sexpr{}
With citation() you can quickly get the reference:
install.packages("RMySQL")
citation("RMySQL")
##
## To cite package 'RMySQL' in publications use:
##
## Jeroen Ooms, David James, Saikat DebRoy, Hadley Wickham and
## Jeffrey Horner (2017). RMySQL: Database Interface and 'MySQL'
## Driver for R. R package version 0.10.11.
## https://CRAN.R-project.org/package=RMySQL
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {RMySQL: Database Interface and 'MySQL' Driver for R},
## author = {Jeroen Ooms and David James and Saikat DebRoy and Hadley Wickham and Jeffrey Horner},
## year = {2017},
## note = {R package version 0.10.11},
## url = {https://CRAN.R-project.org/package=RMySQL},
## }
---
title: "R Schnittstellen"
author: "Jan-Philipp Kolb"
date: "21 April 2017"
output:
pdf_document: default
bibliography: Rschnittstellen.bib
---
date()
## [1] "Sat May 06 22:11:57 2017"
$$
\begin{equation}\label{eq2}
t_{i}=\sum\limits_{k=1}^{M_{i}}{y_{ik}}=M_{i}\bar{Y}_{i}.
\end{equation}
$$
Slide with two columns
====================================
First column
***
Second column
transition: rotate
Inserting a new chapter
====================================
type: section
A different slide type
====================================
type: prompt
Yet another slide type
====================================
type: alert
My presentation
========================================
author: Jan-Philipp Kolb
font-family: 'Impact'
My presentation
========================================
author: Jan-Philipp Kolb
font-import: http://fonts.googleapis.com/css?family=Risque
font-family: 'Risque'
Normal font size
<small>This sentence will appear smaller.</small>
http://rpubs.com/Japhilko82/FirstRpubs

---
title: "ioslides Beispiel"
author: "Jan-Philipp Kolb"
date: "20 April 2017"
output:
ioslides_presentation:
logo: figure/Rlogo.png
---
library(knitr)
a <- data.frame(a=1:10,b=10:1)
kable(table(a))
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
knitr engines
To change the presentation type you can modify the CSS.
install.packages("rticles")
The rmdformats package - HTML Output Formats and Templates for 'rmarkdown'
install.packages("rmdformats")
ProjectTemplate - Automates the Creation of New Statistical Analysis
install.packages("ProjectTemplate")
tufte - Tufte's Styles for R Markdown Documents
install.packages("tufte")
install.packages("flexdashboard", type = "source")
import sys
print(sys.version)
## 2.7.10 (default, May 23 2015, 09:44:00) [MSC v.1500 64 bit (AMD64)]
On Windows, open a console by typing cmd into the search box.
jupyter notebook
Start beaker.command.bat
url <- "https://raw.githubusercontent.com/Japhilko/GeoData/master/2015/data/whcSites.csv"
whcSites <- read.csv(url)
whcSitesDat <- with(whcSites,data.frame(name_en,category))
library(knitr)
kable(head(whcSitesDat))
| name_en | category |
|---|---|
| Cultural Landscape and Archaeological Remains of the Bamiyan Valley | Cultural |
| Minaret and Archaeological Remains of Jam | Cultural |
| Historic Centres of Berat and Gjirokastra | Cultural |
| Butrint | Cultural |
| Al Qal’a of Beni Hammad | Cultural |
| M’Zab Valley | Cultural |
DT
install.packages("DT")
whcSitesDat2 <- with(whcSites,data.frame(name_en,category,longitude,latitude,date_inscribed,area_hectares,danger_list))
With datatable you can create a first interactive table:
library('DT')
datatable(whcSitesDat2)
datatable(whcSitesDat2)http://rpubs.com/Japhilko82/WHCdata
magrittr
install.packages("magrittr")
library("magrittr")
library(magrittr)
str1 <- "Hallo Welt"
str1 %>% substr(1,5)
## [1] "Hallo"
str1 %>% substr(1,5) %>% toupper()
## [1] "HALLO"
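The pipe is only a different way of writing nested calls; both forms below compute the same thing (a base R sketch, no packages needed):

```r
str1 <- "Hallo Welt"

# nested call, read inside-out
toupper(substr(str1, 1, 5))
## [1] "HALLO"

# equivalent pipe version (with magrittr loaded):
# str1 %>% substr(1, 5) %>% toupper()
```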
leaflet
install.packages("leaflet")
library("leaflet")
m <- leaflet() %>%
addTiles() %>% # Add default OpenStreetMap map tiles
addMarkers(lng=whcSites$lon,
lat=whcSites$lat,
popup=whcSites$name_en)
m
whcSites$color <- "red"
whcSites$color[whcSites$category=="Cultural"] <- "blue"
whcSites$color[whcSites$category=="Mixed"] <- "orange"
m1 <- leaflet() %>%
addTiles() %>%
addCircles(lng=whcSites$lon,
lat=whcSites$lat,
popup=whcSites$name_en,
color=whcSites$color)
World Heritage sites
m2 <- leaflet() %>%
addTiles(group = "OSM (default)") %>%
addProviderTiles("Stamen.Toner", group = "Toner") %>%
addProviderTiles("Stamen.TonerLite", group = "Toner Lite") %>%
addCircles(lng=whcSites$lon,
lat=whcSites$lat,
popup=whcSites$name_en) %>%
addLayersControl(
baseGroups = c("OSM (default)", "Toner", "Toner Lite"),
options = layersControlOptions(collapsed = FALSE)
)
m2
outline <- quakes[chull(quakes$long, quakes$lat),]
map <- leaflet(quakes) %>%
# Base groups
addTiles(group = "OSM (default)") %>%
addProviderTiles("Stamen.Toner", group = "Toner") %>%
addProviderTiles("Stamen.TonerLite", group = "Toner Lite") %>%
# Overlay groups
addCircles(~long, ~lat, ~10^mag/5, stroke = F, group = "Quakes") %>%
addPolygons(data = outline, lng = ~long, lat = ~lat,
fill = F, weight = 2, color = "#FFFFCC", group = "Outline") %>%
# Layers control
addLayersControl(
baseGroups = c("OSM (default)", "Toner", "Toner Lite"),
overlayGroups = c("Quakes", "Outline"),
options = layersControlOptions(collapsed = FALSE)
)
map
library(sp)
Sr1 = Polygon(cbind(c(2, 4, 4, 1, 2), c(2, 3, 5, 4, 2)))
Sr2 = Polygon(cbind(c(5, 4, 2, 5), c(2, 3, 2, 2)))
Sr3 = Polygon(cbind(c(4, 4, 5, 10, 4), c(5, 3, 2, 5, 5)))
Sr4 = Polygon(cbind(c(5, 6, 6, 5, 5), c(4, 4, 3, 3, 4)), hole = TRUE)
Srs1 = Polygons(list(Sr1), "s1")
Srs2 = Polygons(list(Sr2), "s2")
Srs3 = Polygons(list(Sr4, Sr3), "s3/4")
SpP = SpatialPolygons(list(Srs1, Srs2, Srs3), 1:3)
leaflet(height = "300px") %>% addPolygons(data = SpP)
library(maps)
mapStates = map("state", fill = TRUE, plot = FALSE)
leaflet(data = mapStates) %>% addTiles() %>%
addPolygons(fillColor = topo.colors(10, alpha = NULL), stroke = FALSE)
m <- leaflet() %>% setView(lng = -71.0589, lat = 42.3601, zoom = 12)
m %>% addTiles()
m %>% addProviderTiles("Stamen.Toner")
m %>% addProviderTiles("CartoDB.Positron")
m %>% addProviderTiles("Esri.NatGeoWorldMap")
m %>% addProviderTiles("OpenTopoMap")
m %>% addProviderTiles("Thunderforest.OpenCycleMap")
leaflet() %>% addTiles() %>% setView(-93.65, 42.0285, zoom = 4) %>%
addWMSTiles(
"http://mesonet.agron.iastate.edu/cgi-bin/wms/nexrad/n0r.cgi",
layers = "nexrad-n0r-900913",
options = WMSTileOptions(format = "image/png", transparent = TRUE),
attribution = "Weather data © 2012 IEM Nexrad"
)
m %>% addProviderTiles("MtbMap") %>%
addProviderTiles("Stamen.TonerLines",
options = providerTileOptions(opacity = 0.35)) %>%
addProviderTiles("Stamen.TonerLabels")
greenLeafIcon <- makeIcon(
iconUrl = "http://leafletjs.com/examples/custom-icons/leaf-green.png",
iconWidth = 38, iconHeight = 95,
iconAnchorX = 22, iconAnchorY = 94,
shadowUrl = "http://leafletjs.com/examples/custom-icons/leaf-shadow.png",
shadowWidth = 50, shadowHeight = 64,
shadowAnchorX = 4, shadowAnchorY = 62
)
leaflet(data = quakes[1:4,]) %>% addTiles() %>%
addMarkers(~long, ~lat, icon = greenLeafIcon)
leaflet(quakes) %>% addTiles() %>% addMarkers(
clusterOptions = markerClusterOptions()
)
leaflet() %>% addTiles() %>%
addRectangles(
lng1=-118.456554, lat1=34.078039,
lng2=-118.436383, lat2=34.062717,
fillColor = "transparent"
)
install.packages('DT')
library('DT')
exdat <- read.csv("data/exdat.csv")
datatable(exdat)
Here is the result - an example of an interactive table
datatable(head(exdat, 20), options = list(
columnDefs = list(list(className = 'dt-center', targets = 5)),
pageLength = 5,
lengthMenu = c(5, 10, 15, 20)
))
datatable(exdat, options = list(searchHighlight = TRUE), filter = 'top')
D3 is one of the most powerful of the many JavaScript libraries currently available for data visualization.
D3, ggplot2 and RStudio
install.packages("ggvis")
library("ggvis")
library(dplyr)
mtcars %>% ggvis(~wt, ~mpg) %>% layer_points()
mtcars %>%
ggvis(~wt, ~mpg, fill = ~factor(cyl)) %>%
layer_points() %>%
group_by(cyl) %>%
layer_model_predictions(model = "lm")
ggvis
mtcars %>%
ggvis(~wt, ~mpg) %>%
layer_smooths(span = input_slider(0.5, 1, value = 1)) %>%
layer_points(size := input_slider(100, 1000, value = 100))
install.packages("googleVis")
library(googleVis)
df <- data.frame(year=1:11, x=1:11,
x.scope=c(rep(TRUE, 8), rep(FALSE, 3)),
y=11:1, y.html.tooltip=LETTERS[11:1],
y.certainty=c(rep(TRUE, 5), rep(FALSE, 6)),
y.emphasis=c(rep(FALSE, 4), rep(TRUE, 7)))
plot(
gvisScatterChart(df,options=list(lineWidth=2))
)
install.packages("devtools")
library(devtools)
install_github("clickme", "nachocab")
library(clickme)
# simple
clickme("points", 1:10)
# fancy
n <- 500
clickme("points",
x = rbeta(n, 1, 10), y = rbeta(n, 1, 10),
names = sample(letters, n, r = T),
color_groups = sample(LETTERS[1:3], n, r = T),
title = "Zoom Search Hover Click")
install.packages("d3Network")
library(d3Network)
Source <- c("A", "A", "A", "A", "B", "B", "C", "C", "D")
Target <- c("B", "C", "D", "J", "E", "F", "G", "H", "I")
NetworkData <- data.frame(Source, Target)
d3SimpleNetwork(NetworkData, width = 400, height = 250)
##
## <!DOCTYPE html>
## <meta charset="utf-8">
## <body>
## <style>
## .link {
## stroke: #666;
## opacity: 0.6;
## stroke-width: 1.5px;
## }
## .node circle {
## stroke: #fff;
## opacity: 0.6;
## stroke-width: 1.5px;
## }
## text {
## font: 7px serif;
## opacity: 0.6;
## pointer-events: none;
## }
## </style>
##
## <script src=http://d3js.org/d3.v3.min.js></script>
##
## <script>
## var links = [ { "source" : "A", "target" : "B" }, { "source" : "A", "target" : "C" }, { "source" : "A", "target" : "D" }, { "source" : "A", "target" : "J" }, { "source" : "B", "target" : "E" }, { "source" : "B", "target" : "F" }, { "source" : "C", "target" : "G" }, { "source" : "C", "target" : "H" }, { "source" : "D", "target" : "I" } ] ;
## var nodes = {}
##
## // Compute the distinct nodes from the links.
## links.forEach(function(link) {
## link.source = nodes[link.source] ||
## (nodes[link.source] = {name: link.source});
## link.target = nodes[link.target] ||
## (nodes[link.target] = {name: link.target});
## link.value = +link.value;
## });
##
## var width = 400
## height = 250;
##
## var force = d3.layout.force()
## .nodes(d3.values(nodes))
## .links(links)
## .size([width, height])
## .linkDistance(50)
## .charge(-200)
## .on("tick", tick)
## .start();
##
## var svg = d3.select("body").append("svg")
## .attr("width", width)
## .attr("height", height);
##
## var link = svg.selectAll(".link")
## .data(force.links())
## .enter().append("line")
## .attr("class", "link");
##
## var node = svg.selectAll(".node")
## .data(force.nodes())
## .enter().append("g")
## .attr("class", "node")
## .on("mouseover", mouseover)
## .on("mouseout", mouseout)
## .on("click", click)
## .on("dblclick", dblclick)
## .call(force.drag);
##
## node.append("circle")
## .attr("r", 8)
## .style("fill", "#3182bd");
##
## node.append("text")
## .attr("x", 12)
## .attr("dy", ".35em")
## .style("fill", "#3182bd")
## .text(function(d) { return d.name; });
##
## function tick() {
## link
## .attr("x1", function(d) { return d.source.x; })
## .attr("y1", function(d) { return d.source.y; })
## .attr("x2", function(d) { return d.target.x; })
## .attr("y2", function(d) { return d.target.y; });
##
## node.attr("transform", function(d) { return "translate(" + d.x + "," + d.y + ")"; });
## }
##
## function mouseover() {
## d3.select(this).select("circle").transition()
## .duration(750)
## .attr("r", 16);
## }
##
## function mouseout() {
## d3.select(this).select("circle").transition()
## .duration(750)
## .attr("r", 8);
## }
## // action to take on mouse click
## function click() {
## d3.select(this).select("text").transition()
## .duration(750)
## .attr("x", 22)
## .style("stroke-width", ".5px")
## .style("opacity", 1)
## .style("fill", "#E34A33")
## .style("font", "17.5px serif");
## d3.select(this).select("circle").transition()
## .duration(750)
## .style("fill", "#E34A33")
## .attr("r", 16)
## }
##
## // action to take on mouse double click
## function dblclick() {
## d3.select(this).select("circle").transition()
## .duration(750)
## .attr("r", 6)
## .style("fill", "#E34A33");
## d3.select(this).select("text").transition()
## .duration(750)
## .attr("x", 12)
## .style("stroke", "none")
## .style("fill", "#E34A33")
## .style("stroke", "none")
## .style("opacity", 0.6)
## .style("font", "7px serif");
## }
##
## </script>
## </body>
fileConn <- file("FirstNetwork.js")
writeLines(d3SimpleNetwork(NetworkData), fileConn)
close(fileConn)
library(dygraphs)
dygraph(nhtemp, main = "New Haven Temperatures") %>%
dyRangeSelector(dateWindow = c("1920-01-01", "1960-01-01"))
library(googleVis)
op <- options(gvis.plot.tag = "chart")
## Add the mean
CityPopularity$Mean=mean(CityPopularity$Popularity)
CC <- gvisComboChart(CityPopularity, xvar='City',
yvar=c('Mean', 'Popularity'),
options=list(seriesType='bars',
width=450, height=300,
title='City Popularity',
series='{0: {type:\"line\"}}'))
plot(CC)
## <!-- ComboChart generated in R 3.3.3 by googleVis 0.6.2 package -->
## <!-- Sat May 06 22:12:05 2017 -->
##
##
## <!-- jsHeader -->
## <script type="text/javascript">
##
## // jsData
## function gvisDataComboChartID15b0312d4d6b () {
## var data = new google.visualization.DataTable();
## var datajson =
## [
## [
## "New York",
## 450,
## 200
## ],
## [
## "Boston",
## 450,
## 300
## ],
## [
## "Miami",
## 450,
## 400
## ],
## [
## "Chicago",
## 450,
## 500
## ],
## [
## "Los Angeles",
## 450,
## 600
## ],
## [
## "Houston",
## 450,
## 700
## ]
## ];
## data.addColumn('string','City');
## data.addColumn('number','Mean');
## data.addColumn('number','Popularity');
## data.addRows(datajson);
## return(data);
## }
##
## // jsDrawChart
## function drawChartComboChartID15b0312d4d6b() {
## var data = gvisDataComboChartID15b0312d4d6b();
## var options = {};
## options["allowHtml"] = true;
## options["seriesType"] = "bars";
## options["width"] = 450;
## options["height"] = 300;
## options["title"] = "City Popularity";
## options["series"] = {0: {type:"line"}};
##
##
## var chart = new google.visualization.ComboChart(
## document.getElementById('ComboChartID15b0312d4d6b')
## );
## chart.draw(data,options);
##
##
## }
##
##
## // jsDisplayChart
## (function() {
## var pkgs = window.__gvisPackages = window.__gvisPackages || [];
## var callbacks = window.__gvisCallbacks = window.__gvisCallbacks || [];
## var chartid = "corechart";
##
## // Manually see if chartid is in pkgs (not all browsers support Array.indexOf)
## var i, newPackage = true;
## for (i = 0; newPackage && i < pkgs.length; i++) {
## if (pkgs[i] === chartid)
## newPackage = false;
## }
## if (newPackage)
## pkgs.push(chartid);
##
## // Add the drawChart function to the global list of callbacks
## callbacks.push(drawChartComboChartID15b0312d4d6b);
## })();
## function displayChartComboChartID15b0312d4d6b() {
## var pkgs = window.__gvisPackages = window.__gvisPackages || [];
## var callbacks = window.__gvisCallbacks = window.__gvisCallbacks || [];
## window.clearTimeout(window.__gvisLoad);
## // The timeout is set to 100 because otherwise the container div we are
## // targeting might not be part of the document yet
## window.__gvisLoad = setTimeout(function() {
## var pkgCount = pkgs.length;
## google.load("visualization", "1", { packages:pkgs, callback: function() {
## if (pkgCount != pkgs.length) {
## // Race condition where another setTimeout call snuck in after us; if
## // that call added a package, we must not shift its callback
## return;
## }
## while (callbacks.length > 0)
## callbacks.shift()();
## } });
## }, 100);
## }
##
## // jsFooter
## </script>
##
## <!-- jsChart -->
## <script type="text/javascript" src="https://www.google.com/jsapi?callback=displayChartComboChartID15b0312d4d6b"></script>
##
## <!-- divChart -->
##
## <div id="ComboChartID15b0312d4d6b"
## style="width: 450; height: 300;">
## </div>
install.packages("threejs")
library(threejs)
z <- seq(-10, 10, 0.01)
x <- cos(z)
y <- sin(z)
scatterplot3js(x,y,z, color=rainbow(length(z)))
Rook - tools for creating web applications with R
install.packages("Rook")
Installing plotly
install.packages("plotly")
library("plotly")
plotly for R
p <- plot_ly(midwest, x = ~percollege, color = ~state, type = "box")
p
url <- "https://raw.githubusercontent.com/Japhilko/GeoData/master/2015/data/whcSites.csv"
whcSites <- read.csv(url)
p <- plot_ly(whcSites, x = ~date_inscribed, color = ~category_short, type = "box")
p
# install.packages("visNetwork")
library(visNetwork)
nodes <- data.frame(id = 1:3)
edges <- data.frame(from = c(1,2), to = c(1,3))
visNetwork(nodes, edges, width = "100%")
visDocumentation()
vignette("Introduction-to-visNetwork") # with CRAN version
shiny::runApp(system.file("shiny", package = "visNetwork"))
install.packages('DiagrammeR')
library('DiagrammeR')
DiagrammeR("
graph LR
A-->B
A-->C
C-->E
B-->D
C-->D
D-->F
E-->F
")
DiagrammeR("
gantt
dateFormat YYYY-MM-DD
title Adding GANTT diagram functionality to mermaid
section A section
Completed task :done, des1, 2014-01-06,2014-01-08
Active task :active, des2, 2014-01-09, 3d
Future task : des3, after des2, 5d
Future task2 : des4, after des3, 5d
section Critical tasks
Completed task in the critical line :crit, done, 2014-01-06,24h
Implement parser and jison :crit, done, after des1, 2d
Create tests for parser :crit, active, 3d
Future task in critical line :crit, 5d
Create tests for renderer :2d
Add to mermaid :1d
")
library(DiagrammeR)
mermaid("
gantt
dateFormat YYYY-MM-DD
title A Very Nice Gantt Diagram
section Basic Tasks
This is completed :done, first_1, 2014-01-06, 2014-01-08
This is active :active, first_2, 2014-01-09, 3d
Do this later : first_3, after first_2, 5d
Do this after that : first_4, after first_3, 5d
section Important Things
Completed, critical task :crit, done, import_1, 2014-01-06,24h
Also done, also critical :crit, done, import_2, after import_1, 2d
Doing this important task now :crit, active, import_3, after import_2, 3d
Next critical task :crit, import_4, after import_3, 5d
section The Extras
First extras :active, extras_1, after import_4, 3d
Second helping : extras_2, after extras_1, 20h
More of the extras : extras_3, after extras_1, 48h
Web data is often not stored as .xlsx, .csv, .dta or similar, but in one of the following formats: .json, .xml etc.
The structure of the data can be inspected with a JSON viewer.
jsonlite
install.packages("jsonlite")
library(jsonlite)
citation("jsonlite")
##
## To cite jsonlite in publications use:
##
## Jeroen Ooms (2014). The jsonlite Package: A Practical and
## Consistent Mapping Between JSON Data and R Objects.
## arXiv:1403.2805 [stat.CO] URL http://arxiv.org/abs/1403.2805.
##
## A BibTeX entry for LaTeX users is
##
## @Article{,
## title = {The jsonlite Package: A Practical and Consistent Mapping Between JSON Data and R Objects},
## author = {Jeroen Ooms},
## journal = {arXiv:1403.2805 [stat.CO]},
## year = {2014},
## url = {http://arxiv.org/abs/1403.2805},
## }
library("jsonlite")
DRINKWATER <- fromJSON("data/RomDrinkingWater.geojson")
names(DRINKWATER)[1:3]
## [1] "type" "generator" "copyright"
names(DRINKWATER)[4:5]
## [1] "timestamp" "features"
head(DRINKWATER$features)
## type id properties.@id properties.amenity properties.flow
## 1 Feature node/246574149 node/246574149 drinking_water push-button
## 2 Feature node/246574150 node/246574150 drinking_water <NA>
## 3 Feature node/246574151 node/246574151 drinking_water <NA>
## 4 Feature node/248743324 node/248743324 drinking_water <NA>
## 5 Feature node/251773348 node/251773348 drinking_water <NA>
## 6 Feature node/251773551 node/251773551 drinking_water <NA>
## properties.type properties.name properties.name:fr properties.wheelchair
## 1 nasone <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA>
## 5 nasone <NA> <NA> <NA>
## 6 <NA> Acqua Marcia Eau potable yes
## properties.created_by properties.indoor geometry.type
## 1 <NA> <NA> Point
## 2 <NA> <NA> Point
## 3 <NA> <NA> Point
## 4 <NA> <NA> Point
## 5 <NA> <NA> Point
## 6 <NA> <NA> Point
## geometry.coordinates
## 1 12.49191, 41.89479
## 2 12.49095, 41.89489
## 3 12.48774, 41.89450
## 4 12.48773, 41.89354
## 5 12.48529, 41.88539
## 6 12.48386, 41.89332
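The mapping works in both directions: toJSON() serializes R objects to JSON and fromJSON() restores them. A small round-trip sketch (the data frame below is made up; requires the jsonlite package):

```r
library(jsonlite)

df <- data.frame(name = c("Butrint", "M'Zab Valley"),
                 category = c("Cultural", "Cultural"),
                 stringsAsFactors = FALSE)

js   <- toJSON(df)    # data frame -> JSON array of objects
back <- fromJSON(js)  # JSON -> data frame again

identical(df$name, back$name)
## [1] TRUE
```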
my_repos <- fromJSON("https://api.github.com/users/japhilko/repos")
names(my_repos)
## [1] "id" "name" "full_name"
## [4] "owner" "private" "html_url"
## [7] "description" "fork" "url"
## [10] "forks_url" "keys_url" "collaborators_url"
## [13] "teams_url" "hooks_url" "issue_events_url"
## [16] "events_url" "assignees_url" "branches_url"
## [19] "tags_url" "blobs_url" "git_tags_url"
## [22] "git_refs_url" "trees_url" "statuses_url"
## [25] "languages_url" "stargazers_url" "contributors_url"
## [28] "subscribers_url" "subscription_url" "commits_url"
## [31] "git_commits_url" "comments_url" "issue_comment_url"
## [34] "contents_url" "compare_url" "merges_url"
## [37] "archive_url" "downloads_url" "issues_url"
## [40] "pulls_url" "milestones_url" "notifications_url"
## [43] "labels_url" "releases_url" "deployments_url"
## [46] "created_at" "updated_at" "pushed_at"
## [49] "git_url" "ssh_url" "clone_url"
## [52] "svn_url" "homepage" "size"
## [55] "stargazers_count" "watchers_count" "language"
## [58] "has_issues" "has_projects" "has_downloads"
## [61] "has_wiki" "has_pages" "forks_count"
## [64] "mirror_url" "open_issues_count" "forks"
## [67] "open_issues" "watchers" "default_branch"
library(jsonlite)
res <- fromJSON('http://ergast.com/api/f1/2004/1/results.json')
drivers <- res$MRData$RaceTable$Races$Results[[1]]$Driver
colnames(drivers)
## [1] "driverId" "code" "url" "givenName"
## [5] "familyName" "dateOfBirth" "nationality" "permanentNumber"
article_key <- "&api-key=c2fede7bd9aea57c898f538e5ec0a1ee:6:68700045"
url <- "http://api.nytimes.com/svc/search/v2/articlesearch.json?q=obamacare+socialism"
req <- fromJSON(paste0(url, article_key))
articles <- req$response$docs
colnames(articles)
## [1] "web_url" "snippet" "lead_paragraph"
## [4] "abstract" "print_page" "blog"
## [7] "source" "multimedia" "headline"
## [10] "keywords" "pub_date" "document_type"
## [13] "news_desk" "section_name" "subsection_name"
## [16] "byline" "type_of_material" "_id"
## [19] "word_count" "slideshow_credits"
The XML package
library(XML)
citation("XML")
##
## To cite package 'XML' in publications use:
##
## Duncan Temple Lang and the CRAN Team (2016). XML: Tools for
## Parsing and Generating XML Within R and S-Plus. R package
## version 3.98-1.5. https://CRAN.R-project.org/package=XML
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {XML: Tools for Parsing and Generating XML Within R and S-Plus},
## author = {Duncan Temple Lang and the CRAN Team},
## year = {2016},
## note = {R package version 3.98-1.5},
## url = {https://CRAN.R-project.org/package=XML},
## }
##
## ATTENTION: This citation information has been auto-generated from
## the package DESCRIPTION file and may need manual editing, see
## 'help("citation")'.
The xml2 package
install.packages("xml2")
library(xml2)
citation("xml2")
##
## To cite package 'xml2' in publications use:
##
## Hadley Wickham and James Hester (2016). xml2: Parse XML. R
## package version 1.0.0. https://CRAN.R-project.org/package=xml2
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {xml2: Parse XML},
## author = {Hadley Wickham and James Hester},
## year = {2016},
## note = {R package version 1.0.0},
## url = {https://CRAN.R-project.org/package=xml2},
## }
url <- "http://api.openstreetmap.org/api/0.6/relation/62422"
library(XML)
BE <- xmlParse(url)
Administrative boundaries of Berlin
xmltop = xmlRoot(BE)
class(xmltop)
## [1] "XMLInternalElementNode" "XMLInternalNode"
## [3] "XMLAbstractNode"
xmlSize(xmltop)
## [1] 1
xmlSize(xmltop[[1]])
## [1] 328
XPath, the XML Path Language, is a query language for selecting nodes from an XML document.
xpathApply(BE,"//tag[@k = 'source:population']")
## [[1]]
## <tag k="source:population" v="http://www.statistik-berlin-brandenburg.de/Publikationen/Stat_Berichte/2010/SB_A1-1_A2-4_q01-10_BE.pdf 2010-10-01"/>
##
## attr(,"class")
## [1] "XMLNodeSet"
url2 <- "http://api.openstreetmap.org/api/0.6/node/2923760808"
RennesBa <- xmlParse(url2)
url3 <- "http://api.openstreetmap.org/api/0.6/way/72799743"
MadCalle <- xmlParse(url3)
Overpass API logo
The Overpass API is a read-only API that serves up custom selected parts of the OSM map data.
(http://wiki.openstreetmap.org/wiki/Overpass_API)
http://wiki.openstreetmap.org/wiki/Map_Features
osm map features
Spielplätze Mannheim
Export Rohdaten
Link1 <- "http://www.overpass-api.de/api/interpreter?data=[maxsize:1073741824][timeout:900];area[name=\""
library(XML)
place <- "Mannheim"
type_obj <- "node"
object <- "leisure=playground"
InfoList <- xmlParse(paste(Link1,place,"\"];",
  type_obj,"(area)[",object,"];out;",sep=""))
Playgrounds in Mannheim
The list of IDs with the value playground:
node_id <- xpathApply(InfoList,
  "//tag[@v= 'playground']/parent::node/@id")
# node_id[[1]]
First node id
lat_x <- xpathApply(InfoList,
  "//tag[@v= 'playground']/parent::node/@lat")
# lat_x[[1]];lat_x[[2]]
lon_x <- xpathApply(InfoList,
  "//tag[@v= 'playground']/parent::node/@lon")
Latitude and longitude coordinates
library(devtools)
install_github("Japhilko/gosmd")
library(gosmd)
pg_MA <- get_osm_nodes(object="leisure=playground",
"Mannheim")
info <- extract_osm_nodes(OSM.Data=pg_MA,
value="playground")
| id | leisure | lat | lon | note |
|---|---|---|---|---|
| 30560755 | playground | 49.51910 | 8.502807 | NA |
| 76468450 | playground | 49.49633 | 8.539396 | Rutsche, Schaukel, großer Sandkasten, Tischtennis |
| 76468534 | playground | 49.49678 | 8.552959 | NA |
| 76468535 | playground | 49.49230 | 8.548750 | NA |
| 76468536 | playground | 49.50243 | 8.548140 | Schaukel, Rutsche, Sandkasten, Spielhäuser, Tischtennis |
| 76468558 | playground | 49.49759 | 8.542036 | NA |
http://www.stat.berkeley.edu/~statcur/Workshop2/Presentations/XML.pdf
http://www.omegahat.net/RSXML/shortIntro.pdf
http://www.di.fc.ul.pt/~jpn/r/web/index.html#parsing-xml
http://www.w3schools.com/xml/xquery_intro.asp
http://giventhedata.blogspot.de/2012/06/r-and-web-for-beginners-part-ii-xml-in.html
http://gastonsanchez.com/Handling_and_Processing_Strings_in_R.pdf
XML - Gaston Sanchez
library("XML")
Gaston Sanchez - Dataflow
His work can be seen here.
Gaston Sanchez - Getting web data
| Function | Description |
|---|---|
| xmlName() | name of the node |
| xmlSize() | number of subnodes |
| xmlAttrs() | named character vector of all attributes |
| xmlGetAttr() | value of a single attribute |
| xmlValue() | contents of a leaf node |
| xmlParent() | name of parent node |
| xmlAncestors() | name of ancestor nodes |
| getSibling() | siblings to the right or to the left |
| xmlNamespace() | the namespace (if there’s one) |
<www.openstreetmap.org/export>
osm export
Administrative boundaries for Germany
url <- "http://api.openstreetmap.org/api/0.6/relation/62422"
BE <- xmlParse(url)
Administrative boundaries of Berlin
xmltop <- xmlRoot(BE)
class(xmltop)
## [1] "XMLInternalElementNode" "XMLInternalNode"
## [3] "XMLAbstractNode"
xmlSize(xmltop)
## [1] 1
xmlSize(xmltop[[1]])
## [1] 328
XPath, the XML Path Language, is a query language for selecting nodes from an XML document.
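Since the live API calls below need a network connection, here is a minimal self-contained sketch of the same XPath pattern, applied to an XML string parsed in memory. The document content is made up for illustration; it assumes the XML package is installed:

```r
library(XML)

# A tiny OSM-like document, parsed from a string instead of the live API
doc <- xmlParse('<osm><node id="1"><tag k="population" v="3440441"/></node></osm>')

# Select all <tag> elements whose k attribute equals "population",
# then read the v attribute of the first match
res <- xpathApply(doc, "//tag[@k = 'population']")
xmlGetAttr(res[[1]], "v")
## [1] "3440441"
```

The same `//tag[@k = '...']` expression is used against the real OSM API responses below.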
xpathApply(BE, "//tag[@k = 'population']")
## [[1]]
## <tag k="population" v="3440441"/>
##
## attr(,"class")
## [1] "XMLNodeSet"
xpathApply(BE, "//tag[@k = 'source:population']")
## [[1]]
## <tag k="source:population" v="http://www.statistik-berlin-brandenburg.de/Publikationen/Stat_Berichte/2010/SB_A1-1_A2-4_q01-10_BE.pdf 2010-10-01"/>
##
## attr(,"class")
## [1] "XMLNodeSet"
xpathApply(BE, "//tag[@k = 'name:ta']")
## [[1]]
## <tag k="name:ta" v="பெர்லின்"/>
##
## attr(,"class")
## [1] "XMLNodeSet"
region <- xpathApply(BE,
"//tag[@k = 'geographical_region']")
# regular expressions
region[[1]]
## <tag k="geographical_region" v="Barnim;Berliner Urstromtal;Teltow;Nauener Platte"/>
<tag k="geographical_region"
v="Barnim;Berliner Urstromtal;
Teltow;Nauener Platte"/>
Barnim
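The `geographical_region` value is a semicolon-separated list; as the `# regular expressions` comment hints, it can be split into a character vector with base R's `strsplit()`:

```r
# Value of the geographical_region tag, copied from the output above
v <- "Barnim;Berliner Urstromtal;Teltow;Nauener Platte"

# strsplit() returns a list with one element per input string
regions <- strsplit(v, ";")[[1]]
regions[1]
## [1] "Barnim"
```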
url2 <- "http://api.openstreetmap.org/api/0.6/node/25113879"
obj2 <- xmlParse(url2)
obj_amenity <- xpathApply(obj2, "//tag[@k = 'amenity']")[[1]]
obj_amenity
## <tag k="amenity" v="university"/>
xpathApply(obj2, "//tag[@k = 'wikipedia']")[[1]]
## <tag k="wikipedia" v="de:Universität Mannheim"/>
xpathApply(obj2, "//tag[@k = 'wheelchair']")[[1]]
xpathApply(obj2, "//tag[@k = 'name']")[[1]]
url3 <- "http://api.openstreetmap.org/api/0.6/node/303550876"
obj3 <- xmlParse(url3)
xpathApply(obj3, "//tag[@k = 'opening_hours']")[[1]]
## <tag k="opening_hours" v="Mo-Sa 09:00-20:00; Su,PH off"/>
url4 <- "http://api.openstreetmap.org/api/0.6/node/25439439"
obj4 <- xmlParse(url4)
xpathApply(obj4, "//tag[@k = 'railway:station_category']")[[1]]
## <tag k="railway:station_category" v="2"/>
library(rvest)
bhfkat <- read_html(
  "https://de.wikipedia.org/wiki/Bahnhofskategorie")
df_html_bhfkat <- html_table(
  html_nodes(bhfkat, "table")[[1]], fill = TRUE)
| Category | Platform edges | Platform length | Passengers/day | Train stops/day |
|---|---|---|---|---|
| 6 | 1 | up to 90 m | 0 to 49 | 0 to 10 |
| 5 | 2 | 90 to 140 m | 50 to 299 | 11 to 50 |
| 4 | 3 to 4 | 140 to 170 m | 300 to 999 | 51 to 100 |
| 3 | 5 to 9 | 170 to 210 m | 1,000 to 9,999 | 101 to 500 |
| 2 | 10 to 14 | 210 to 280 m | 10,000 to 49,999 | 501 to 1,000 |
| 1 | 15 or more | over 280 m | 50,000 or more | 1,001 or more |
url5 <- "http://api.openstreetmap.org/api/0.6/way/162149882"
obj5 <- xmlParse(url5)
xpathApply(obj5, "//tag[@k = 'name']")[[1]]
## <tag k="name" v="City-Airport Mannheim"/>
xpathApply(obj5, "//tag[@k = 'website']")[[1]]
## <tag k="website" v="http://www.flugplatz-mannheim.de/"/>
xpathApply(obj5, "//tag[@k = 'iata']")[[1]]
## <tag k="iata" v="MHG"/>
Deborah Nolan - Extracting data from XML
Duncan Temple Lang - A Short Introduction to the XML package for R
More information
rvest
library(rvest)
ht <- read_html('https://www.google.co.in/search?q=guitar+repair+workshop')
links <- ht %>% html_nodes(xpath = '//h3/a') %>% html_attr('href')
gsub('/url\\?q=', '', sapply(strsplit(links[as.vector(grep('url', links))], split = '&'), '[', 1))
## [1] "http://theguitarrepairworkshop.com/"
## [2] "http://www.guitarservices.com/"
## [3] "http://www.guitarrepairbench.com/guitar-building-projects/guitar-workshop/guitar-workshop-project.html"
## [4] "https://www.facebook.com/The-Guitar-Repair-Workshop-847517635259712/"
## [5] "https://www.taylorguitars.com/dealer/guitar-repair-workshop-ltd"
## [6] "http://www.laweekly.com/music/10-best-guitar-repair-shops-in-los-angeles-4647166"
## [7] "https://www.justdial.com/Mumbai/Guitar-Repair-Services/nct-10988623"
## [8] "https://www.justdial.com/Delhi-NCR/Guitar-Repair-Services/nct-10988623"
## [9] "http://guitarworkshopglasgow.com/pages/repairs-1"
## [10] "http://www.google.co.in/aclk?sa=l"
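The cleaning step above combines three base R functions. To see what each does without querying Google, here is the same pipeline applied to a single hard-coded link (a hypothetical value of the kind the result page embeds):

```r
# A raw href as Google embeds it in result pages (hypothetical example)
link <- "/url?q=http://theguitarrepairworkshop.com/&sa=U&ved=abc"

# 1. split at '&' and keep the first piece
first_part <- strsplit(link, split = "&")[[1]][1]
# 2. strip the leading '/url?q=' prefix (the '?' must be escaped in the regex)
cleaned <- gsub("/url\\?q=", "", first_part)
cleaned
## [1] "http://theguitarrepairworkshop.com/"
```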
install.packages("tidyverse")
library(tidyverse)
library(stringr)
library(forcats)
library(ggmap)
library(rvest)
html.world_ports <- read_html("https://en.wikipedia.org/wiki/List_of_busiest_container_ports")
df.world_ports <- html_table(html_nodes(html.world_ports, "table")[[2]], fill = TRUE)
glimpse(df.world_ports)
## Observations: 50
## Variables: 15
## $ Rank <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16...
## $ Port <chr> "Shanghai", "Singapore", "Shenzhen", "Ningbo-Zhoushan...
## $ Economy <chr> "China", "Singapore", "China", "China", "Hong Kong", ...
## $ 2015[1] <chr> "36,516", "30,922", "24,142", "20,636", "20,073", "19...
## $ 2014[2] <chr> "35,268", "33,869", "23,798", "19,450", "22,374", "18...
## $ 2013[3] <chr> "33,617", "32,240", "23,280", "17,351", "22,352", "17...
## $ 2012[4] <chr> "32,529", "31,649", "22,940", "16,670", "23,117", "17...
## $ 2011[5] <chr> "31,700", "29,937", "22,570", "14,686", "24,384", "16...
## $ 2010[6] <chr> "29,069", "28,431", "22,510", "13,144", "23,532", "14...
## $ 2009[7] <chr> "25,002", "25,866", "18,250", "10,502", "20,983", "11...
## $ 2008[8] <chr> "27,980", "29,918", "21,414", "11,226", "24,248", "13...
## $ 2007[9] <chr> "26,150", "27,932", "21,099", "9,349", "23,881", "13,...
## $ 2006[10] <chr> "21,710", "24,792", "18,469", "7,068", "23,539", "12,...
## $ 2005[11] <chr> "18,084", "23,192", "16,197", "5,208", "22,427", "11,...
## $ 2004[12] <chr> "14,557", "21,329", "13,615", "4,006", "21,984", "11,...
In the following I will show how text information from Wikipedia can be downloaded, processed, and analyzed.
install.packages("NLP")
install.packages("tm")
install.packages("FactoMineR")
stringi by Marek Gagolewski and Bartek Tartanus offers facilities for string processing.
library("stringi")
tm is an R package for text mining. It was written by Ingo Feinerer, Kurt Hornik, and David Meyer.
library("tm")
The FactoMineR package, created by Sebastien Le, Julie Josse, and Francois Husson, is used to perform the principal component analysis.
library("FactoMineR")
wiki <- "http://de.wikipedia.org/wiki/"
titles <- c("Zika-Virus", "Influenza-A-Virus_H1N1",
"Spanische_Grippe","Influenzavirus",
"Vogelgrippe_H5N1",
"Legionellose-Ausbruch_in_Warstein_2013",
  "Legionellose-Ausbruch_in_Jülich_2014")
articles <- character(length(titles))
for (i in 1:length(titles)){
articles[i] <- stri_flatten(
readLines(stri_paste(wiki, titles[i])), col = " ")
}
docs <- Corpus(VectorSource(articles))
The following is based on a blog post by Norbert Ryciak on the automatic categorization of Wikipedia articles.
docs2 <- tm_map(docs, function(x) stri_replace_all_regex(
x, "<.+?>", " "))
docs3 <- tm_map(docs2, function(x) stri_replace_all_fixed(
  x, "\t", " "))
docs4 <- tm_map(docs3, PlainTextDocument)
docs5 <- tm_map(docs4, stripWhitespace)
docs6 <- tm_map(docs5, removeWords, stopwords("german"))
docs7 <- tm_map(docs6, removePunctuation)
docs8 <- tm_map(docs7, tolower)
# docs8 <- tm_map(docs8, PlainTextDocument)
dtm <- DocumentTermMatrix(docs8)
dtm2 <- as.matrix(dtm)
frequency <- colSums(dtm2)
frequency <- sort(frequency, decreasing=TRUE)
words <- frequency[frequency>20]
s <- dtm2[1,which(colnames(dtm2) %in% names(words))]
for(i in 2:nrow(dtm2)){
s <- cbind(s,dtm2[i,which(colnames(dtm2) %in%
names(words))])
}
colnames(s) <- titles
PCA(s)
## **Results for the Principal Component Analysis (PCA)**
## The analysis was performed on 125 individuals, described by 7 variables
## *The results are available in the following objects:
##
## name description
## 1 "$eig" "eigenvalues"
## 2 "$var" "results for the variables"
## 3 "$var$coord" "coord. for the variables"
## 4 "$var$cor" "correlations variables - dimensions"
## 5 "$var$cos2" "cos2 for the variables"
## 6 "$var$contrib" "contributions of the variables"
## 7 "$ind" "results for the individuals"
## 8 "$ind$coord" "coord. for the individuals"
## 9 "$ind$cos2" "cos2 for the individuals"
## 10 "$ind$contrib" "contributions of the individuals"
## 11 "$call" "summary statistics"
## 12 "$call$centre" "mean of the variables"
## 13 "$call$ecart.type" "standard error of the variables"
## 14 "$call$row.w" "weights for the individuals"
## 15 "$call$col.w" "weights for the variables"
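The DocumentTermMatrix built above boils down to counting words per document. The same counting can be sketched with base R alone, using a made-up mini-document (a hypothetical sentence, not from the Wikipedia corpus):

```r
# Hypothetical mini-document; tm's DocumentTermMatrix does this per document
txt <- "Zika virus is a virus"

# Split on anything that is not a letter, normalize case, then tabulate
words <- tolower(unlist(strsplit(txt, "[^[:alpha:]]+")))
freq <- sort(table(words), decreasing = TRUE)
freq[["virus"]]
## [1] 2
```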
s0 <- s/apply(s,1,sd)
h <- hclust(dist(t(s0)), method = "ward.D")
plot(h, labels = titles, sub = "")
git commit
git push
http://stackoverflow.com/questions/1125968/force-git-to-overwrite-local-files-on-pull
WinDirStat https://support.microsoft.com/de-de/kb/912997 http://www.pcwelt.de/tipps/Update-Dateien-loeschen-8357046.html
install.packages("devtools")
library(devtools)
install_github("Japhilko/gosmd")
Robert Gentleman, in R Programming for Bioinformatics, 2008, about R’s built-in C interfaces:
Since R is not compiled, in some situations its performance can be substantially improved by writing code in a compiled language. There are also reasons not to write code in other languages, and in particular we caution against premature optimization; prototyping in R is often cost effective. And in our experience very few routines need to be implemented in other languages for efficiency reasons. Another substantial reason not to use an implementation in some other language is increased complexity. The use of another language almost always results in higher maintenance costs and less stability. In addition, any extensions or enhancements of the code will require someone that is proficient in both R and the other language.
Why? - R becomes slow or runs into memory-management problems, for example with loops that cannot be vectorized.
When? - when you cannot do better with R code and have identified the slow code.
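A quick way to see the cost of a non-vectorized loop is to time it against the built-in vectorized equivalent; a base R sketch:

```r
# An R loop vs. the vectorized sum(); both compute the same value,
# but the loop is interpreted element by element
x <- runif(1e6)
slow_sum <- function(v) {
  s <- 0
  for (i in seq_along(v)) s <- s + v[i]
  s
}
system.time(slow_sum(x))  # noticeably slower
system.time(sum(x))       # near-instant
all.equal(slow_sum(x), sum(x))
## [1] TRUE
```

Loops like this one *can* be vectorized; Rcpp is for the cases where they cannot.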
For Windows, Rtools
For Mac, Xcode
We will use the following two packages:
inline and its cfunction to write inline C code that is compiled on the fly (there is also a cxxfunction for C++ code).
Rcpp, using the function cppFunction
install.packages("Rcpp")
library(Rcpp)
cppFunction('int add(int x, int y, int z) {
int sum = x + y + z;
return sum;
}')
# add works like a regular R function
add
add(1, 2, 3)
Tutorial on Rcpp by Hadley Wickham
library(Rcpp)
cppFunction('int add(int x, int y, int z) {
  int sum = x + y + z;
  return sum;
}')
add(1, 2, 3)
install.packages("microbenchmark")
library(microbenchmark)
Oliver Heidmann - Programmieren in R - Rcpp
One uses the interface to databases,…
Key-value stores (e.g. CouchDB, MongoDB) and the storage of unstructured data are made possible by schema evolution
NoSQL can handle considerably larger amounts of data
Horizontal scalability - important for data such as video, audio, or image files
The NoSQL movement is not proprietarily tied to a single vendor
dplyr
install.packages("nycflights13")
library(nycflights13)
dim(flights)
## [1] 336776     19
head(flights)
## # A tibble: 6 × 19
## year month day dep_time sched_dep_time dep_delay arr_time
## <int> <int> <int> <int> <int> <dbl> <int>
## 1 2013 1 1 517 515 2 830
## 2 2013 1 1 533 529 4 850
## 3 2013 1 1 542 540 2 923
## 4 2013 1 1 544 545 -1 1004
## 5 2013 1 1 554 600 -6 812
## 6 2013 1 1 554 558 -4 740
## # ... with 12 more variables: sched_arr_time <int>, arr_delay <dbl>,
## # carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
## # air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>,
## # time_hour <dttm>
filter()
library(dplyr)
head(filter(flights, month == 1, day == 1))
## # A tibble: 6 × 19
## year month day dep_time sched_dep_time dep_delay arr_time
## <int> <int> <int> <int> <int> <dbl> <int>
## 1 2013 1 1 517 515 2 830
## 2 2013 1 1 533 529 4 850
## 3 2013 1 1 542 540 2 923
## 4 2013 1 1 544 545 -1 1004
## 5 2013 1 1 554 600 -6 812
## 6 2013 1 1 554 558 -4 740
## # ... with 12 more variables: sched_arr_time <int>, arr_delay <dbl>,
## # carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
## # air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>,
## # time_hour <dttm>
dplyr
install.packages("downloader")
library(downloader)
url <- "https://raw.githubusercontent.com/genomicsclass/dagdata/master/inst/extdata/msleep_ggplot2.csv"
filename <- "msleep_ggplot2.csv"
if (!file.exists(filename)) download(url,filename)
msleep <- read.csv("msleep_ggplot2.csv")
head(msleep)
## name genus vore order conservation
## 1 Cheetah Acinonyx carni Carnivora lc
## 2 Owl monkey Aotus omni Primates <NA>
## 3 Mountain beaver Aplodontia herbi Rodentia nt
## 4 Greater short-tailed shrew Blarina omni Soricomorpha lc
## 5 Cow Bos herbi Artiodactyla domesticated
## 6 Three-toed sloth Bradypus herbi Pilosa <NA>
## sleep_total sleep_rem sleep_cycle awake brainwt bodywt
## 1 12.1 NA NA 11.9 NA 50.000
## 2 17.0 1.8 NA 7.0 0.01550 0.480
## 3 14.4 2.4 NA 9.6 NA 1.350
## 4 14.9 2.3 0.1333333 9.1 0.00029 0.019
## 5 4.0 0.7 0.6666667 20.0 0.42300 600.000
## 6 14.4 2.2 0.7666667 9.6 NA 3.850
sleepData <- select(msleep, name, sleep_total)
head(sleepData)
## name sleep_total
## 1 Cheetah 12.1
## 2 Owl monkey 17.0
## 3 Mountain beaver 14.4
## 4 Greater short-tailed shrew 14.9
## 5 Cow 4.0
## 6 Three-toed sloth 14.4
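For comparison, the column selection that `select()` performs corresponds to plain column indexing in base R. A stand-alone sketch using a small hypothetical subset of the msleep data:

```r
# Hypothetical subset of the msleep data, so the sketch runs without download
df <- data.frame(name = c("Cheetah", "Owl monkey", "Cow"),
                 sleep_total = c(12.1, 17.0, 4.0),
                 bodywt = c(50, 0.48, 600))

# select(msleep, name, sleep_total) corresponds to:
sleepData <- df[, c("name", "sleep_total")]
colnames(sleepData)
```

dplyr's advantage is readability and the ability to chain such steps with the pipe.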
RPostgreSQL
PostgreSQL
# install.packages("RPostgreSQL")
library("RPostgreSQL")
sudo -u postgres createuser Japhilko
sudo -u postgres createdb -E UTF8 -O Japhilko offlgeoc
The postgis extension must be installed for the database:
CREATE EXTENSION postgis;
osm2pgsql -c -d osmBerlin --slim -C -k berlin-latest.osm.pbf
CREATE EXTENSION hstore;
osm2pgsql -s -U postgres -d offlgeoc /home/kolb/Forschung/osmData/data/saarland-latest.osm.pbf
sudo -u postgres createdb -E UTF8 -O Japhilko offlgeocRLP
CREATE EXTENSION postgis;
osm2pgsql -s -U postgres -d offlgeocRLP -o gazetteer /home/kolb/Forschung/osmData/data/rheinland-pfalz-latest.osm.pbf
This is how you get all administrative boundaries:
SELECT name FROM planet_osm_polygon WHERE boundary='administrative'
pw <- {"1234"}
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname = "offlgeocRLP",
host = "localhost", port = 5432,
user = "postgres", password = pw)
rm(pw) # removes the password
dbExistsTable(con, "planet_osm_polygon")
df_postgres <- dbGetQuery(con, "SELECT name, admin_level FROM planet_osm_polygon WHERE boundary='administrative'")
barplot(table(df_postgres[,2]), col = "royalblue")
df_adm8 <- dbGetQuery(con, "SELECT name, admin_level FROM planet_osm_polygon WHERE boundary='administrative' AND admin_level='8'")
library(knitr)
# kable(head(df_adm8))
df_hnr <- dbGetQuery(con, "SELECT * FROM planet_osm_line, planet_osm_point
WHERE planet_osm_line.name='Nordring' AND planet_osm_line.highway IN ('motorway','trunk','primary')
AND planet_osm_point.name='Ludwigshafen' AND planet_osm_point.place IN ('city', 'town')
ORDER BY ST_Distance(planet_osm_line.way, planet_osm_point.way)")
df_hnr <- dbGetQuery(con, "SELECT * FROM planet_osm_line, planet_osm_point
WHERE planet_osm_line.name='Nordring' AND planet_osm_point.name='Ludwigshafen'
ORDER BY ST_Distance(planet_osm_line.way, planet_osm_point.way)")
head(df_hnr)
df_ <- dbGetQuery(con, "SELECT * FROM planet_osm_line, planet_osm_point
WHERE planet_osm_line.name='Nordring' AND planet_osm_point.name='Ludwigshafen'
ORDER BY ST_Distance(planet_osm_line.way, planet_osm_point.way)")
head(df_)
colnames(df_)
table(df_$name)
df_sipp <- dbGetQuery(con, "SELECT * FROM planet_osm_line, planet_osm_point
WHERE planet_osm_line.name='Rechweg' AND planet_osm_point.name='Sippersfeld'
ORDER BY ST_Distance(planet_osm_line.way, planet_osm_point.way)")
head(df_sipp)
restnam <- dbGetQuery(con, "SELECT name, COUNT(osm_id) AS anzahl
FROM planet_osm_point
WHERE amenity = 'restaurant'
AND name <> ''
GROUP BY name
ORDER BY anzahl DESC
LIMIT 10")
head(restnam)
install.packages("plot3D")
library(plot3D)
library(RPostgreSQL)
RMySQL
install.packages("RMySQL")
install.packages("mongolite")
library(mongolite)
m <- mongo(collection = "diamonds")
Alternatively, the sofa package can be used:
install.packages("jsonlite")
devtools::install_github("ropensci/sofa")
library("sofa")